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Abstract 

We present a new deterministic algorithm for the sparse Fourier trans- 
form problem, in which we seek to identify k ^ N significant Fourier coef- 
ficients from a signal of bandwidth A'^. Previous deterministic algorithms 
exhibit quadratic runtime scaling, while our algorithm scales linearly with 
k in the average case. Underlying our algorithm are a few simple observa- 
tions relating the Fourier coefficients of time-shifted samples to unshifted 
samples of the input function. This allows us to detect when aliasing be- 
tween two or more frequencies has occurred, as well as to determine the 
value of unaliased frequencies. We show that empirically our algorithm is 
orders of magnitude faster than competing algorithms. 

1 Introduction 

The Fast Fourier Transform (FFT) is arguably the most ubiquitous numerical 
algorithm in scientific computing. In addition to being named one of the "Top 
Ten Algorithms" of the past century |DS00j , the FFT is a critical tool in myriad 
applications, ranging from signal processing to computational PDF and machine 
learning. At the time of its introduction, it represented a major leap forward in 
the size of problems that could be solved on available hardware, as it reduces 
the runtime complexity of computing the Discrete Fourier Transform (DFT) of 
a length- TV array from 0(7V2) to O(A^logiV). 

Any algorithm which computes all N Fourier coefficients has a runtime com- 
plexity of n{N), since it takes that much time merely to report the output. How- 
ever, in many applications it is known that the DFT of the signal of interest is 
highly sparse - that is, only a small number of coefficients are non-zero. In this 
case it is possible to break the n{N) barrier by asking only for the largest k terms 
in the signal's DFT. When k <^ N existing algorithms can significantly outper- 
form even highly optimized FFT implementations (IGS07[ HwelOi IHIKP12b| . 

1.1 Related Work 

The first works to implicitly address the sparse approximate DFT problem ap- 
peared in the theoretical computer science literature in the early 1990s. In jLMN93| . 



a variant of the Fourier transform for Boolean functions was shown to have ap- 
plications for learnability. A polynomial-time algorithm to find large coefficients 
in this basis was given in |KM93| , while the interpolation of sparse polynomials 
over finite fields was considered in |Man95j . It was later realized |GMS05) that 
this last algorithm could be considered as an approximate DFT for the special 
case when TV is a power of two. 

In the past ten or so years, a number of algorithms have appeared which 
directly address the problem of computing sparse approximate Fourier trans- 
forms. When comparing the results in the literature, care must be taken to 
identify the class of signals over which a specific algorithm is to perform, as well 
as to identify the error bounds of a given method. Different algorithms have 
been devised in different research communities, and so have varying assumptions 
on the underlying signals as well as different levels of acceptable error. 

The first result with sub-linear runtime and sampling requirements appeared 
m |GGI+02| . They give apoly(fc,log7V, log(l/i5), 1/e) time algorithm for finding, 
with probability 1 — S, an approximation y of the DFT of the input x that 
is nearly optimal, in the sense that ||a; — y||2 < {1 + e)\\x — ioptHii where 
Xopt is the best fc-term approximation to x. Here the exponent of k in the 
runtime is two, so the algorithm is quadratic in the sparsity. Moreover, the 
algorithm is non-adaptive in the sense that the samples used are independent of 
the input x. This algorithm was modified in |GMS05| to bring the dependence 
on k down to linear This was accomplished mainly by replacing uniform 
random variables (used to sample the input) by random arithmetic progressions, 
which allowed the use of nonequispaced fast Fourier transforms to sample from 
intermediate representations and to estimate the coefficients in near-linear time. 
The increased overhead of this procedure, however, limited the range of k for 
which the algorithm outperformed a standard FFT implementation |IGS07| . 

Around the same time, a similar algorithm was developed in the context of 
list decoding for proving hard-core predicates for one-w ay functi ons |AGS03] . 
This can be considered an extension of |KM93| . and like |GGI+02[[GMS05] is a 
randomized algorithm. Since the goal in this work was to give a polynomial-time 
algorithm for list decoding, no effort was made to optimize the dependence on 
k; it stands at fc"/^, considerably higher than |GGI+02[[GMS05] . The random- 
ness in this algorithm is used only to construct a sample set on which norms 
are estimated, and in |AkalOj this set is replaced with a deterministic construc- 
tion. This construction is based on the notion of e-approximating the uniform 
distribution over arithmetic progressions, and relies on existing constructions 
of e-biased sets of small size |Kat891 lAIK+90 . Depending on the size of the 
e-biased sets used, the sampling and runtime complexities are 0{k'^ log'^ N) and 
0(fc^ log'^ iV), respectively, for some c > 4|^ 

^See iGSTOSi for a "user-friendly" description of the improved algorithm. 

^ Specifically, the runtime is O (fc^ ■ log A' ■ 1 51 ) , where S is the set of samples read by the algo- 
rithm. This set takes the form S = Uf^i ^4 — 6^, where A has e-discrepancy on rank 2 Bohr 
sets, Bf e-approximates the uniform distribution on [0, 2* — 1] nZ, and A — Bi is the difference 
set. Using constructions from fKat89 one has \A\ = 0(£~^ log* iV), \Bt\ = 0(£~^ log'^ N); 
setting e = e(A:-i) and noting that HJ-4 - Bi\ = 0{J2\^~ and \A - Bi\ = 0{|^||B^|) 
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In the series of works |Iwe08[ llwelOi llwellj . a different deterministic al- 
gorithm for sparse Fourier approximation was given that rehes on the combi- 
natorial properties of aliasing, or collisions among frequencies in sub-sampled 
DFTs. By taking enough short DFTs of co-prime lengths, and employing the 
Chinese Remainder Theorem to reconstruct energetic frequencies from their 
residues modulo these sample lengths, the author is able to prove sampling and 
runtime bounds of 0(fc^ log'* A^). The error bound is of the form \\x — y\\2 < 
— ^opt||2 + A:~^/^||i — ioptlli; it has been shown that the stronger "€2-^2" guar- 
antee of |GMS05j cannot hold for a sub-linear, deterministic algorithm jCDD09j . 
Moreover, the range of k for which this algorithm is faster than the FFT is 
smaller in practice than that of |GMS05) . 

Most recently, the authors of |HIKP12b) presented a randomized algorithm 
that extends by an order of magnitude the range of sparsity for which it is 
faster than the FFT. This is accomplished by removing the iterative aspect from 
[GMSOSj by using more efficient filters, which are nearly flat within the passband 
and which decay exponentially outside. In contrast, the box-car filters used in 
|GMS05j have a frequency response which oscillates and decays like |w|~*. In 
addition, the identification of significant frequencies is done by direct estimation 
after hashing into a large number of bins rather than the binary search technique 
of (GMSOSj . These changes give a runtime bound of O (log Ny/Nk log N) and 
a somewhat stronger error bound \\x — y|j^ < efc^-'^||i; — a;opt||2 + '^ll^lli with 
probability 1 — where e > and S — TV^'-'^^) is a precision parameter. 

These existing algorithms g enerally ta ke one of two approaches to the sparse 
Fourier transform problem. In |GGI+02[ EGSOSl K^MSOSl IHIKP12bj . the spec- 
trum of the input is randomly permuted and then run through a low-pass filter to 
isolate and identify frequencies which carry a large fraction of the signal's energy. 
This leads to randomized algorithms that fail on a non-negligible set of possible 
inputs. On the other hand, [IwelOj takes advantage of the combinatorial prop- 
erties of aliasing in order to identify the significant frequencies. This leads to 
a deterministic algorithm with higher runtime and sampling requirements than 
the randomized algorithms mentioned. Both of these randomized and determin- 
istic approaches have drawbacks. Randomized algorithms are not suitable for 
failure-intolerant applications, while the process used to reconstruct significant 
frequencies in |IwelO| relies on the Chinese Remainder Theorem (CRT), which 
is highly unstable to errors in the residues. While there do exist algorithms for 
"noisy Chinese Remaindering" |GRSOO[ IBon02[ [5S04] these have thus far not 
found application to the sparse DFT problem, and we leave this as future work. 

As this paper was being prepared, the authors became aware of an indepen- 
dent work using very similar methods for frequency estimation in the noiseless 
case |HIKP12a] . Both methods consider the phase difference between Fourier 
samples to extract frequency information, but are based on different techniques 
for binning significant frequencies. The authors of |IIIKP12a] use random dila- 
tions and efficient filters of [II IKP12b] . whereas we use different sample lengths 
in the spirit of [IwelOj . We believe both contributions are of interest, and rein- 

(see, e.g., ITVOGI ) one obtains the stated sampling and runtime complexities. 
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force the notion that exploiting phase information is critical for developing fast, 
robust algorithms for the sparse Fourier transform problem. 

1.2 New Results 

In this paper we describe a simple, deterministic algorithm that avoids recon- 
struction with the CRT. We are thus able to avoid two pitfalls associated with 
existing algorithms. Our method relies on sampling the signal in the time do- 
main at slightly shifted points, and thus it assumes access to an underlying 
continuous-time signal. The shifted time samples allow us to determine the 
value of significant frequencies in sub-sampled FFTs and also indicate when 
two or more frequencies have been aliased in such a sub-sampled FFT. These 
two key facts allow us to significantly reduce (by up to two orders of magni- 
tude) the average-case sampling and runtime complexity of the sparse FFT over 
a certain class of random signals. Our worst-case bounds improve by a constant 
factor those of prior deterministic algorithms. We present both adaptive and 
non-adaptive versions of our algorithms. If the application allows samples to 
be acquired adaptively (that is, dependent on previous samples), we are able to 
improve further on our average-case bounds. 

The remainder of this paper is organized as follows. In section[2]we introduce 
notation and prove the technical lemmas underlying our algorithms. In section 
[3] we introduce randomized and deterministic versions of our algorithm. In 
section [4] we prove that our algorithm has average-case runtime and sampling 
complexities of 0(/clog(fc)) and 0(fc), respectively. In section [s] we present the 
results of an empirical evaluation of our algorithm and compare its runtime and 
sampling requirements to competing algorithms. Finally in section[6]we provide 
some concluding remarks and discuss ongoing work to appear in the future. 

2 Mathematical Background 
2.1 Preliminaries 

Throughout this work we shall be concerned with frequency-sparse band-limited 
signals S : [0, 1) — > C of the form 

k 

5(t)=^a,e2-.*, (1) 
j=i 

where ujj G [-N/2,N/2) n Z, aj G C, and k N. The Fourier series of S is 
given by 

S{uj) = / S'(t)e-2'^'"*dt, a; e Z, (2) 
Jo 

so that for signals of the form ([I]) we have S{ujj) = aj and S{(jj) ~ for all other 
uj £ [—N/2, N/2)r\Z. Given any finite sequence S = (sq, si, . . . , Sp_i) of length 
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p we define its Discrete Fourier transform (DFT) by 



S[h] ^ ^.,e^ = E-^t^Wp'' (3) 



where /i = 0, 1, . . . ,p — 1, S[j] :— sj and Wp := is the primitive p-th root 

of unity. The Fast Fourier Transform (FFT) allows the computation of S in 
O(plogp) steps. 

We apply the DFT to discrete samples of S{t) to compute the Fourier coef- 
ficients Qj of S{t). For an integer p and real e > we form discrete arrays of 
samples of S of length p via 

Sp[j]^s[^), SpAj]^s[^+e), j=0,l,...,p-l. 

Now assume that all ujj (mod p), 1 < j < k are distinct. It is a simple derivation 
to obtain 

paj h = u)j (mod p) 
otherwise. 



Sp[h] 



By examining the peaks of Sp [h] we will be able to determine {a;^ (mod p) : 
1 ^ J < fc}- Previous approaches applied the Chinese Remainder Theorem to 
reconstruct {uJj} by taking a suitable number of p's, which must overcome the 
problem of registrations to match up each tUj whenever a new p is used (see 
e.g. |IwelO[|Iwe llJ). Our algorithm takes a different approach using the shifted 
sub-samples. Note that 



Sp,e[h] 



pUjC^^^^'^^ h = u}j (mod p) 
otherwise. 



It follows that in this setting, for h = ujj (mod p) we have = Q^-Kieuj _ 
Hence 

27r£a., ^ Arg ( %^ I (mod 27r), (4) 

where Arg(2:) denotes the phase angle of the complex number z in [— 7r,7r). 
Assume that we take |e| < jj. Then ujj is completely determined by (jij) as 
there will be no wrap-around aliasing, and 

• - ' A,gf|#V (5) 



27re 



Sp[h] 



In fact, more generally, if we have an estimate of Wj, say \ujj\ < ^, then by 
taking |e| < the same reconstruction formula ([5| holds. The observation 
that by taking slightly shifted samples will allow us to identify frequencies in 
S{t) underlies the algorithms which follow, and the bulk of this paper analyzes 
various aspects of the proposed algorithms, such as efficiency and robustness. 



5 



One of the problems is that when p < iV it is possible that two or more 
distinct frequencies will have the same remainder modulo p. In this case we say 
the frequencies are aliased or collide (mod p). In general, for h G {0, . . . — 1} 
and the given signal S{t) let I(S, h;p) :— {j : ujj = h (mod p)}. Then we have 

Sp[h]= S{uj)=p aj. (6) 

uj=h {mod p) j^I(S,h-^p) 

When aliasing occurs reconstruction via ^ is no longer valid. The aliasing 
phenomenon presents a serious challenge for any method with sub-linear sam- 
pling complexity. In the next section we develop a simple test to determine 
whether or not aliasing has occurred in a p-length DFT, which then allows us 
to effectively overcome this challenge and develop provably correct sub-linear 
algorithms. 



2.2 Technical Lemmas 

To effectively apply the sub-sample idea in a Fourier algorithm one must first 
overcome the aliasing challenge. Using shifted sub-samples gives us a simple 
yet extremely effective criterion to determine whether or not aliasing has oc- 
curred at a given location in a p-length DFT without resorting to complicated 
combinatorial techniques. Observe that complementing ([6]) we have 



Sp.elh] 



E 



j£l{S,h-p) 



It follows that 



(7) 



p^ E 

3,l£l(SJi;p) 



27:ie{ujj—uJi ) 



(8) 



p 



Lemma 2.1. Letp > 1 and h G {0,1, . . . ,p— 1}. Assume that q = \I{S, h;p)\ > 
1, i.e. LOj = h (mod p) for more than one j in S{t). Then we have the following: 



(A) Let s > and E := {uJj 



j,m e I{S,h;p)}. Suppose that all 



elements of eE are distinct (mod 1). Then \ Sp^rn£[h] \ 7^ for some 

1 < m < ~ q. 

(B) For almost all e > we have \Sp^^[h]\ ^ |Sp[/i]|. 

Proof. The proof of part (B) is immediate from | 



Observe that /(e) := 

is trigonometric polynomial in e, and it is not identically 
I{S,h;p)\ > 1. Thus it has at most finitely many zeros for 



Sp,,[h] - Sp[h] 
given that q = 
e E [0, 1), and hence (B) is clearly true. 
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We resort to the Vandermonde matrix to prove part (A). For simplicity we 
write f{t) = J2aeE^a<3'^^^°'*- Set := e^'^'"'^ where e satisfies the hypothesis 
of the lemma, which implies that all rj are distinct. Assume the claim of part 
(A) is false. Then we have f{me) = for all < to < g'^ — g. Here /(O) = is 
automatic because Sps) = Sp. Thus we have 

^c„C=0, m-0,l,...,g2_q. (9) 

But the cardinality of E is at most — q + 1, which means that there are at 
most — g + 1 terms in the sum in Because all Ta are distinct the matrix 
[r™] is a nonsingular Vandermonde matrix, and for ^ to hold all Cq must be 
zero. This is clearly not the case, and a contradiction. □ 



Remark. Any irrational e or e = j with a, b coprime and h > 2N will 



satisfy the hypothesis of part (A) of Lemma 2.1 It is also easy to show that in 
the special case where all coefficients Oj are real and \I{S,h\p)\ — 2, we have 
/ PpI'^II s-'^y ^ — T, with a, b coprime and b> N. 

Lemm a |2.1| allow us to determine whether aliasing has occurred by whether 
I Sp^g [/i] I / 1 [ft.] I — 1 for a few values of e. It offers both a deterministic (part 
(B)) and a random (part (A)) procedure to identify aliasing in the sub-sampled 
DFTs. In practice we need to set a tolerance r in order to accept or reject 
frequencies according to the criterion 



Sp[h] 



< T. 



(10) 



We typically choose e 



1/cN for some small constant c > 2, which would 
A tolerance on the order of 



satisfy the hypothesis of part (A) of Lemma 2.1 
p/N works well in general, which is what we use in our experiments in section [5] 
below. 

In our algorithms we will take a number of sub-sampled DFTs of an input 
signal S{t) of the form ([T]), whose lengths we denote pi. Lemma 2.1 allows us 



to determine whether or not two or more frequencies are aliased, so that we 
only add the non-aliased term to our representation. Since it is unlikely that 
two or more frequencies are aliased modulo two different sampling rates, using 
a different pi in a subsequent iteration lets us quickly discover all frequencies 
present in S{t). Lemma 2.3 gives a worst-case bound on the number of 's 



required by our deterministic algorithm to identify all k frequencies in a given 
Fourier-sparse signal. It is similar to [IwelO[ Lemma 1], but with a smaller 
constant. In its proof we use the CRT, which we quote here for completeness 
(see, e.g., |NZM91p . 

Theorem 2.2 (Chinese Remainder Theorem). Any integer n is uniquely spec- 
ified modulo N by its remainders modulo m pairwise relatively prime numbers 
pi, provided YVeLiPi ^ ^• 
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Lemma 2.3. Let M > 1. It suffices to take 1 + {k — l)[log^ A''] pairwise 
relatively prime pi 's with pi > M to ensure that each frequency LUj is isolated 
(i.e. not aliased) (mod pi) for at least one £. 

Proof. Assume otherwise, namely that given pi for i = 1,2, ... ,L with L > 
fc[logj;,j A^J there exists some ujj such that ujj is ahased (mod pe). By the Pigeon 
Hole Principle there exists at least one uj^ ^ ujj such that loj —Um = (mod pi) 
at least q times, where q > [logj,/ A^J . Without loss of generality we assume 
that ojj — uj„i = (mod pe) ior £ = 1,2, ... ,q. Now by the fact that pi> M we 
have 

9 

Jl P£ > > N. 

e=i 

By the CRT we would then have coj = u)„i (mod N), a contradiction. □ 

We remark that the algorithm in |IwelO| requires taking 1 + 2k log^. N co- 
prime sample lengths, since that algorithm requires each w to be isolated in at 
least half of the DFTs of length p^. This requirement stems from the fact that 
that algorithm cannot distinguish between aliased and non-aliased frequencies 
in a given sub-sampled DFT. Our worst-case bound is approximately a factor of 
two better, though in practice our algorithms never use all those sample lengths 
on random input. The fact that we can tell which frequencies are "good" for a 
given p( allows us to construct our Fourier representation one term at a time, 
and quit when we have achieved a prescribed stopping criterion. 

3 Algorithms 

Both of our algorithms proceed along a similar course; in fact they differ only 
in the choice of the sample lengths p^. We assume that we are given access 
to the continuous-time signal S(t) whose Fourier coefficients we would like to 
determine, and further that we can sample from 5* at arbitrary points t in unit 
time. This is an appropriate model for analog signals, but not for discrete 
ones. In the discrete case, one could interpolate between given samples to 
approximate the required S'- values, though we have not implemented or analyzed 
this case. (The same assumptions hold for the algorithms in |IwelO| . while those 
in [GGI+021 IGMS051 IHIKP12bj are formulated purely in the discrete realm.) 
In this paper mainly limit ourselves to the noiseless case. Though this is a 
highly unrealistic assumption, it permits a simple description of the underlying 
algorithm. In section [33| we discuss some of the problems associated with noisy 
signals and give a minor modification of our algorithm for low-level noise. A 
second manuscript in preparation addresses the issue of noise specifically, with 
more significant modifications to the algorithms described below. 

3.1 Non-adaptive 

Our algorithms start by choosing a sample length pi such that j)i > ck for 
some constant c > 1. For a fixed e < l/N, we then compute Sp and ■S'p^e, 
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sort the results by magnitude, and compute frequencies lo via ^ for the k 
largest coefficients in absolute value. We then check whether or not each of 
those frequencies is aliased via ([6]), and if it is not, we add it to our list. The 
coefficient is given by the unshifted sample value Sp [h] at that frequency. After 
this, we combine terms with the same frequency and prune small coefficients 
from the list. We then iterate until a stopping criterion is reached. In the 
empirical study described in section [5] we stopped when the number of distinct 
frequencies in our list equalled the desired sparsity. 

Our deterministic algorithm chooses pi to be the prime greater than ck. 
This ensures that all samples lengths are co-prime, at the expense of taking 



slightly more samples than necessary. By Lemma 2.3 1 + (fc — 1) [log^j. N\ such 



piS suffice to isolate every uj at least once. This gives us worst-case sampling and 
runtime complexity on the same order as |IwelO) . though the results in section [s] 
indicate that on average we significantly outperform those pessimistic bounds. 

Our Las Vegas algorithm chooses pi uniformly at random from the interval 
[ci/c,C2fc] for constants 1 < ci < C2. In this case we cannot make a worst-case 
guarantee on the number of iterations needed by the algorithm to converge. 
However, the results in section [5] indicate that the Las Vegas version performs 
similarly to the deterministic version on the class of signals tested. 



3.2 Adaptive 

The algorithms can also be implemented in an adaptive fashion, by which we 
mean that the size of the current representation is taken into account in sub- 
sequent iterations. In particular, if R is our current representation, we let 
k* — k — \R\ and choose the next pg with respect to k* instead of k. Moreover, 
before taking DFTs, we subtract off the contribution from the current repre- 
sentation, so that effort is not expended re-identifying portions of the spectrum 
already discovered. This idea is similar to that in |GGI"'"02l IGMS05] . though 
in our empirical studies the evaluation of the representation is done directly, 
rather than as an unequally-spaced FFT. This gives our algorithms asymptot- 
ically slower runtime, but the effect is negligible for the values of k studied in 
section [5] A formal description appears below in algorithm [l] 



3.3 Modifications in the presence of noise 

In the noiseless versions of the algorithms described in this paper, a test for 
aliasing is implemented by considering the ratio of magnitudes of shifted and 
unshifted peaks. When the samples are corrupted by noise, there will be two 
challenges. The first challenge is that the reconstruction of frequencies from 
shifts will be corrupted by noise. The second challenge is that there will be 
variations among the magnitudes even for non-aliased terms, so a higher thresh- 
old that depends on the size of the noise must be set. When this threshold is 
too large it affects the ability to distinguish aliased terms as there will be an 
increased number of false negatives. On the other hand, lower thresholds that 
reduce false negatives will lead to an increased number of false positives. 
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Algorithm 1 Phaseshift 



Input: function pointer S, integers ci,C2j^k, N, real e 
Output: R, a sparse representation for S 

0, eo 0, £i ^ £, ^ 1 
while \R\ < k do 
5: k*^k-\R\ 

Pe first prime > cik* 



10: 



15: 



20: 



25: 



for m = to 1 do 
for j = to ^ - 1 do 

j 



{or k if non-adaptive} 
{or Uniform (cifc*,C2fc*) if Las Vegas} 



Se 



rep[j] ^ J2 



2miA}{j/pe+Sm) 



end for 



FFT{Se^rn — Srep) 

- SORT(5tm) 

for j = 1 to k* do 

1 . fs!T[j] 



osort 



27r£ 

end for 



Arg 



end for 

for j = 1 to k* do 



if 



SITU] 



- 1 



< — then 

N 



R<- RU 
end if 
end for 

collect terms in R with same w 
prune small coefficients from R 

end while 



{omit if non-adaptive} 
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The first challenge can be addressed effectively through a combination of 
using larger Pj's, multiple shifts and a multiscale unwrapping. The idea of 
using larger pj's is rather straightforward yet effective. For any given pj the 
DFT detects the location of the frequencies modulo pj rather accurately even 
with substantial noise. Furthermore, the reconstructed frequencies will still 
tend to cluster around the true value. Suppose that we sample the signal and 
compute DFTs of length pj on these samples. The locations of the peaks in these 
short DFTs tell us the accurate value of uj mod pj for each unaliased frequency 
ui appearing in the signal. Writing lu = apj + b with a,b G Z, we now know b 
and must determine a. 

With a small amount of noise the reconstructed frequencies uj using ^ will 
be close to the true ui. We can thus round oj to the nearest integer of the form 
apj + b, which will recover the true frequency w as long as |w — w| < Pj/2. For 
high noise levels, it is possible that the ui will deviate by more than Pj/2 from uj, 
so that the value for a given by rounding will be incorrect. By choosing larger 
Pj (i.e., increasing the parameter ci) one can alleviate the problem somewhat, 
provided that the noise level is not too high. When the noise level is so high 
that taking a large pj is no longer economical, a potential solution is to take 
multiple shifts and employ a multiscale unwrapping technique. We are still at 
the preliminary stage in our study of these new techniques, but early results are 
very encouraging. 

The second challenge poses a bigger problem, but again it can be addressed 
in several ways. The multiscale unwrapping method will repeatedly check for 
aliasing at each stage, which makes it highly unlikely that an aliased frequency 
will pass through all the tests. Even in the unlikely even that it does, our 
algorithm allows false positives. Since each mode is subtracted from the original 
signal in our algorithm, a false positive frequency will lead to an extra mode in 
the new signal. As the process continues it will be extracted and cancel out the 
false frequency extracted earlier. 



4 Average-case analysis 

In this section we prove that the average-case runtime and sampling complexity 
of our algorithm are Q{klogk) and 6(fc), respectively. This is shown over a 
class of random signals described in section [42} Before giving this result on the 
expected runtime and sampling complexity, in section [4. 1| estimate the costs of 
a single iteration of the while loop in algorithm [TJ lines 5-25. We then describe 



in section 4.2 the random signal model over which we prove our average-case 



bounds. In section 4.3 we prove that the expected number of iterations of the 
while loop is constant, and in section [44| we use this result to prove our average- 
case bounds. 
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4.1 While loop runtime and sampling complexity 

The computational cost of the while loop in algorithm[T] lines 5-25 is dominated 
by three operations. The first is the evaluation of the current representation R 
oi k — k* terms at the 0(fc*) points j/pi in line 10. In our implementation, 
we simply calculated this directly, looping over both the sample points and the 
terms in the representation. The complexity of this implementation is 0{pi{k — 
k*)) = 0{k* (k—k*)) = 0{k^), and while non-equispaced fast Fourier transforms 
|DR931 RD96j yield an asymptotically faster runtime of 0(A:log(fc)), they also 
incur large overhead costs. For the values of k considered in this paper, the 
direct evaluation seems to have little effect on the overall runtime. The other 
two dominant computational tasks in the inner loop are the FFTs of 0(A;) 
samples and the subsequent sorting of these DFT coefhcients. It is well-known 
that both of these operations can be done in time 9(A:log(fc)) jCLRSOlj . Thus 
the inner loop has overall time complexity 0(fclog(/c)), assuming the use of 
non-equispaced FFTs. 

4.2 Random signal model 

For both the average-case analysis and for the empirical evaluation described 
in section [5] we considered test signals with uniformly random phase over the 
bandwidth and coefficients chosen uniformly from the complex unit circle. In 
other words, given k and N, we choose k frequencies ujj uniformly at random 
(without replacement) from [—N/2, N/2) n Z. The corresponding Fourier coef- 
ficients Qj are of the form e^^'^J , where 9j is drawn uniformly from [0, 1). The 
signal is then given by 

k 

^^a^e^"'"^*. (11) 

This is the standard signal model considered in previous empirical evaluations 
of sub-linear Fourier algorithms |IGS07[ llwelOi lHIKPT2b) . 

4.3 Markov Analysis of Collisions 

In order to analyze the expected runtime and sampling complexity of our algo- 
rithms, we must estimate the expected number of collisions among frequencies 
modulo the sample lengths used by the algorithms. Recall that in the noiseless 
case, our algorithms are able to detect when a collision between two or more 
frequencies has occurred, and for those that are not aliased we are able to calcu- 
late the value of the frequency. Thus we seek to estimate the expected fraction 
of frequencies that are aliased modulo a given sample length p, since this deter- 
mines how many passes the algorithm makes. In this section we derive bounds 
on the expected value of this quantity and discuss how the stopping criteria 
used in the algorithm affect its average-case performance. 

In the random signal model considered in section [5j we assume the k fre- 
quencies are uniformly distributed over the bandwidth [—N/2, N/2), and so the 
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residues uj modp are also uniformly distributed over [0,p — 1]. Our problem 
then becomes a classical occupancy problem: the number of collisions among 
the frequencies is equivalent to the number of multiple-occupancy bins when k 
balls are thrown uniformly at random into p bins. Define X,„ to be the num- 
ber of single-occupancy bins after m balls are thrown, to be the number of 
multiple-occupancy bins after m balls are thrown, and Zm to be the number of 
zero-occupancy bins after m balls are thrown. Since p is constant, we have the 
trivial relationship Zm = p — Xm — Ym, so it suffices to consider only the pair 
{Xjn, Ym)- When the (m-l-l)*'* ball is thrown, we have the following possibilities: 

• it lands in an unoccupied bucket, with probability Z^/p = 1 — {X^ + 
Y^)/p; 

• it lands in a single-occupancy bucket, with probability Xm/p', 

• it lands in a multiple-occupancy bucket, with probability Y^/p- 



In the first case, we have Xm+i — X„i 
we have X^+i = X^ - 1, Y^+i = Y„^ 



1, Yra+i ~ Ym] in the second case, 
f 1; and in the third case, we have 



X 



Y V V 

E 



Conditioning on the values of X„, Y^ we have 





Xm+1 






Xm 




( 


Ym+1 






Y 





1-2/p -l/p' 








■ 1 " 


l/p 1 




Y 


-t- 






(12) 



so that the system forms a Markov chain. By recursively conditioning on the 
values of Xm-i, Y„i-i, we can calculate the expected values of X^, Yk for any 
A; > using the initial condition Xi = 1, Yi = 0. Denoting by A the matrix in 



the right-hand side of equation (12), we have 



E 



Xfe 
Yk 



'fc-1 




(13) 



Since p{A) = 1 — l/p < 1, where p is the spectral radius, the geometric matrix 
series can be written 



^ A™ = {I - A)-^{I - A 

m— 

After some linear algebra, we obtain 
E 



Xfc 
Yk 



p{l-{l-lf)^k{l-lf-^ 



(14) 



(15) 



Since Zk=p- Xk- Yk, we have E {Zu) = p{l - 1/p)^- 

In our algorithms we choose p = ck for some small integer c. Using this and 
the approximation (1 -I- -)" e^, we have 



E 



Xk 
Yk 



cfc(l - e-i/'^) - /ce-i/" 



(16) 
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This gives a nonlinear equation for the expected number of colhsions among k 
frequencies as a function of the parameter c. Newton's method can then be used 
to determine the value c required to ensure a desired fraction of the frequencies 
are not aliased. For example, to ensure that 90% of frequencies are isolated on 
average, it suffices to take c = 5; this value for the parameter c had already been 
found to give good performance in our empirical evaluation of the algorithms. 



4.4 Average-case runtime and sampling complexity 

In this section we will use a probabilistic recurrence relation due to Karp |Kar94[ 
IDP09| to give average-case performance bounds and concentration results for 
the case when the algorithm is halted after identifying k or more terms. In 
particular, we use the following theorem for recurrences of the form 

T{k) = a{k)+T{H{k)), (17) 

where T(k) denotes the time required to solve an instance of size k, a{k) is the 
amount of work done on a problem of size k, and < H{k) < k is a, random 
variable denoting the size of the subproblem generated by the algorithm. 

Theorem 4.1. \Kar94\ Theorem 1.2] Suppose aik) is nondecreasing, continu- 
ous, and strictly increasing on {x : a{x) > 0}, and that E[iJ(fc)] < m(fc) for a 
nondecreasing continuous function m{k) such thatm{k)/k is also nondecreasing. 
Denote by u{k) the solution to the deterministic recurrence 

u{k) ^ a{k) +uim{k)). (18) 

Then for k > and t eN, 



m{k) 



P[T(fc) > u{k) + ta{k)] < y-^j ■ (19) 
Our algorithm does work a(fc) = 0(fclog(/c)) on input of size k and generates 



a subproblem whose average size is m(fc) = fc/10. (Recall from section 4.3 that 
with the parameter c = 5, on average over 90% of the frequencies were not 
aliased modulo p = 0{ck).) The associated deterministic recurrence is then 

M(fc) = e(fclog(fc)) +u(/c/10), (20) 

whose solution is u{k) = 0(fclog(fc)) (see, e.g., [CLRSOl]). A straightforward 
application of Theorem |4.1| yields 



V[T{k) > e{klog{k)) + tklogik)] < 10-\ (21) 

so that the runtime is tightly concentrated about its mean 0(fclog(A;)). 

The sampling complexity S{k) can be handled in an analogous manner, 
since in this case a{k) = Q{k) and m(fc) — k/10 as before. The associated 
deterministic recurrence becomes 

u{k) = eik)+u{k / 10), (22) 
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whose solution is u{k) = Q{k). Applying Theorem 4.1 again we have 



V[S{k) > Q{k) + tk] < 10~\ (23) 

so that we again have tight concentration of the number of samples around the 
mean 0(fc). 

5 Empirical Evaluation 

In this section we describe the results of an empirical evaluation of the adap- 
tive deterministic and Las Vegas variants of the Phaseshift algorithm described 
above. Both algorithms were implemented in C++ using FFTW 3.0 FJOS^ for 
the FFTs, using FFTW_ESTIMATE plans since the sample lengths are not known 
in advance for the Las Vegas variant. For comparison we also ran the same 
tests on the four variants of GFFT as well as on AAFFT and FFTW itself. 
The FFTW runs utilized the FFTW_PATIENT plans with wisdom enabled, and so 
are highly optimized. The experiments were run on a single core of an Intel 
Xeon E5620 CPU with a clock speed of 2.4 GHz and 24 GB of RAM, running 
SUSE Linux with kernel 2.6.16.60-0.81.2-smp for x86_64. All code was compiled 
with the Intel compiler using the -fast optimization. As in |Iwellj . timing is 
reported in CPU ticks using the cycle. h file included with the source code for 
FFTW. 

In the following sections we refer to our algorithm as "Phaseshift", since 
by taking shifted time samples of the input signal we also shift the phase of 
the Fourier coefficients. To keep the plots readable, we only show data for the 
adaptive, deterministic variant of our algorithm; the other variants perform sim- 
ilarly The algorithms of jlwell] are denoted GFFT-XY, where X e {D,R} and 
Y e {F,S}. The D/R stands for deterministic or randomized, while the F/S 
stands for fast or slow. The fast variants use more samples but less runtime 
while the slow variants use fewer samples but more runtime. In the plots be- 
low, we always show the GFFT variant with the most favorable sampling or 
runtime complexity. Finally, AAFFT denotes the algorithm of |GMS05j . The 
implementations tested are summarized in table [T] along with the average-case 
sampling and runtime complexities, and the associated references. 



5.1 Setup 

Each data point in Fig. [l||2] is the average of 100 independent trials of the 
associated algorithm for the given values of the bandwidth N and the sparsity 
k. The lower and upper bars associated with each data point represent the 
minimum and maximum number of samples or runtime of the algorithm over 
the 100 test functions. The values of k tested were 2,4, 8, . . . ,4096, while the 
values of N were 2^'', 2^^, . . . , 2^^. For larger values of k, the slow GFFT variants 
and AAFFT took too long to complete on our hardware, so we only present 
partial data for these algorithms. Nevertheless, the trend seen in the plots 
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Table 1: Implementations used in the empirical evaluation. 



Algorithm 


R/D 


Samples 


Runtime 


Reference 


PS-Dct 


D 


k 


k log k 


Section 


4 


PS-LV 


R 
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k log k 


Section 




GFFT-DF 


D 


k'^ \og^ N 


/fc2 log^ N 


jlwell 
















GFFT-DS 


D 


k'^ log^ N 


Nk log^ N 


|Iwell 




GFFT-RF 


R 


k \og^ N 


k log^ N 


|Iwell 




GFFT-RS 


R 


k log^ N 


NlogN 


llwell 




AAFFT 


R 


klog^N 


k log" N 


[GMS05] 


FFTW 


D 


N 


NlogN 


,FJ05j 





below continues for higher values of the sparsity. The test signals were generated 
according to the signal model described in section [4?2| 

The Phaseshift and deterministic GFFT variants will always recover such 
signals exactly. The randomized GFFT variants are Monte Carlo algorithms, 
and so, when they succeed, will also recover the signal exactly. AAFFT, on the 
other hand, is an approximation algorithm which will fail on a non-negligible 
set of input signals. However, for the runs depicted in Fig. [l}|2) AAFFT always 
produced an answer with £2 error less than 10^*. The randomized GFFT vari- 
ants failed a total of 7 times out of 2200 test signals, a relatively small amount 
that can be reduced by parameter tuning. For the Phaseshift variants, we chose 
the parameters ci =5, C2 = 10, and took the shift e to be 1/2N. Finally, for 
the randomized GFFT variants, we chose the Monte Carlo parameter to be 1.2. 

5.2 Sampling Complexity 

In Fig. [1] (a), we compare the average number of samples of the input signal S 
required by each algorithm when the bandwidth N fixed at 2^^. The sparsity 
of the test signal is varied from 2 to 4096 by powers of two. We can see that 
the Phaseshift variants require over an order of magnitude fewer samples than 
GFFT-RS, the GFFT variant with the lowest sampling requirements. Both 
Phaseshift variants also require over an order of magnitude fewer samples than 
AAFFT. The comparison with the deterministic GFFT variants is even starker; 
Phaseshift-Det requires two orders of magnitude fewer samples than GFFT-DS 
(not shown), and four orders of magnitude fewer samples than GFFT-DF (not 
shown) . 

In Fig. [2] (a), we compare the average number of samples of the input signal 
S required by each algorithm when the sparsity k is fixed at 60. The bandwidth 
N was varied from 2^'^-2^^ by powers of two. Using powers of two for the 
bandwidth allows the best performance for both FFTW and AAFFT, though 
this fact is more relevant for the runtime comparisons in the following section. 
We can see that the Phaseshift variants require many fewer samples than all 
four GFFT variants as well as AAFFT and FFTW, for all values of TV tested. 
The Phaseshift variants exhibit almost no dependence on the bandwidth for all 
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values of A'', a feature not shared by the other deterministic algorithms. We note 
here that in future work we plan to replace the 1 /2N shift by two or more larger 
shifts with CO- prime denominators to obtain an equivalent shift, as in |WZ98j . 
This should lead to more robustness at high values of N . 

5.3 Runtime Complexity 

In Fig. [l] (b) , we compare the average runtime of each algorithm over 100 test 
signals when the bandwidth N is fixed at 2^^ . The range of sparsity k considered 
is the same as in section 15.21 For all values of k the Phaseshift variants are 
faster than GFFT-RF (the fastest GFFT variant) and AAFFT by more than 
an order of magnitude. When compared to GFFT-RS (not shown), GFFT-DS 
(not shown), and FFTW, the difference in runtime is closer to three orders of 
magnitude. 

In Fig.[2](b), we compare the average runtime of each algorithm over 100 test 
signals when the sparsity k is fixed at 60. The range of bandwidth considered 
is the same as in section [5?2| The Phaseshift variants are the only algorithms 
that outperform FFTW for all values of N tested. The other implementations 
tested only become competitive with the standard FFT for N > 2^°, while ours 
are faster even for modest N . 

5.4 Noisy Case 

We report here on a preliminary study of the performance of the deterministic 
algorithm in the presence of noise. Our noisy signals were of the same form as 
in the previous section, but with complex white gaussian noise of standard de- 
viation a added to each measurement. As described in section [3. 3 [ the simplest 
way to deal with low-level noise is to simply round the reconstructed frequencies 
to the nearest integer of the form apj + 6, where h = ui mod pj is the location 
of the peak in a length-pj DFT. This modification doesn't change the runtime 
or sampling complexity significantly, so in this section we focus on the error in 
the approximation as a function of the noise level a and the parameter ci. 

In the existing literature on the sparse Fourier transform, the ^2 norm is 
most often used to assess the quality of approximation. There are many reasons 
for this choice, with the the two most convincing perhaps being the complete- 
ness of the complex exponentials with respect to the £2 norm and Parseval's 
theorem. For certain applications, however, this choice of norm is inappropri- 
ate. For example, in wide-band spectral estimation and radar applications, one 
is interested in identifying a set of frequency intervals containing active Fourier 
modes. In this case, an estimate oj of the true frequency ui with |tD — a;| ^ A'' is 
useful, but unless uj = uj the €2 metric will report an 0(1) error. Furthermore, 
when considering non-periodic signals (equivalently, non-integer w's) the same 
precision problem appears when using the €2 metric. 

For these reasons, we propose measuring the approximation error of sparse 
Fourier transform problems with the Earth Mover Distance (EMD) |RTG00] . 
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Figure 1: (a) Sampling complexity with fixed bandwidth N ~ 2^^ for PS-Det 
(blue solid line), GFFT-RS (red sohd hue), AAFFT (black dashed hne), and 
FFTW (magenta dashed line), (b) Runtime complexity with fixed bandwidth 
N = f-Q^ PS-Det (blue sohd line), GFFT-RF (red solid hne), AAFFT (black 
dashed line), and FFTW (magenta dashed line). 



18 



Samples for Fixed Sparsity k = 60 




Runtime for Fixed Sparsity k = 60 




Figure 2: (a) Sampling complexity with fixed sparsity fc = 60 for PS-Det (blue 
solid line), GFFT-RS (red solid line), AAFFT (black dashed Hne), and FFTW 
(magenta dashed line), (b) Runtime complexity with fixed sparsity fc = 60 for 
PS-Det (blue solid line), GFFT-RF (red solid line), AAFFT (black dashed line), 
and FFTW (magenta dashed hne). 
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Originally developed in the context of content-based image retrieval, EMD mea- 
sures the minimum cost that must be paid (with a user-specified cost function) 
to transform one distribution of points into another. EMD can be calculated 
efficiently as the solution of a linear program corresponding to a certain fiow 
minimization problem. In our situation, we consider the cost to move a set 

of estimated Fourier modes and coefficients {{u!j,cz.)}''._^ to the true values 
{{'^iTCiUj)} under the cost function 

di((w,c.),(ii,cs);iV) + |c„ - cs|. (24) 

This choice of cost function strikes a balance between the fidelity of the frequency 
estimate (as a fraction of the bandwidth) and that of the coefficient estimate. 
We denote the EMD using di for the cost function by EMD(l) below. 

In figure [3] we report the average EMD(l) error over 100 test signals as a 
function of the input noise level a, for various choices of the parameter ci. In 
this experiment, the sparsity and bandwidth are fixed at fc = 64 and N = 
2^^, respectively. As expected, the error decreases as Ci increases, since the 
rounding procedure described in section [3. 3| is more likely to result in the true 
frequency. Moreover, the error increases linearly with the noise level, indicating 
the procedure's robustness in the presence of noise. 

We remark that in the noiseless case the choice ci = 5 was found to be 
sufficient, while figure [3] indicates that the much larger value Ci « 256 is neces- 
sary for good approximation in the EMD(l) metric. The larger sample lengths 
imply an increase in both the runtime and sampling complexity, and indicate 



that the rounding procedure of section 3.3 should be complemented by other 



modifications. This is the purpose of a second manuscript under preparation. 
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in which we combine the rounding procedure with the use of larger shifts ej in 
a multiscale approach to frequency estimation. 

6 Conclusion 

In this paper we have presented a deterministic and Las Vegas algorithm for 
the sparse Fourier transform problem that empirically outperform existing algo- 
rithms in average-case sampling and runtime complexity. While our worst-case 
bounds do not improve the asymptotic complexity, we are able to extend by an 
order of magnitude the range of sparsity for which our algorithm is faster than 
FFTW in the average case. The improved performance of our algorithm can 
be attributed to two major factors: adaptivity and ability to detect aliasing. 
In particular, we are able to extract more information from a small number of 
function samples by considering the phase of the DFT coefficients in addition 
to their magnitudes. This represents a significant improvement over the current 
state of the art for the sparse Fourier transform problem. 

We have developed a multircsolution approach to handle the noisy case, in 
which we learn the value of a frequency from most to least significant bit by 
increasing the size of the shift e. Finally, we are exploring the extension of these 
methods to handle non-integer frequencies, which would represent the first such 
result in the sparse Fourier transform context. 
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